Japanese Kana-to-Kanji Conversion Using Large Scale Collocation Data
نویسندگان
چکیده
Japanese wad prucessa. cr the cvmputer rated in Japaz employs, input method through keyboard vole canbinxIwith Kay Ohmetic) character b Kaiji (ickogrcphi4 Chime) cirraier aynersiattedsvlogy. .71r key fret►. of Karkto-Kanji co► tersion technology is how to rase the wary cfthe cantersicn hough the hamophae pvcwsirg we hate so many homcplvnes kits pcpet., we sprat the mass cf our Karr-taKayi canersicn experiments which embo* dr homcialme processing using catnsite colloartion daft It is shown that ciprzimately 135,000 °goo:0m dai2yields 9.1 %rnise cfie amtersicn axunory ccmparedwith the protoope .Dstan which ha rro collocatiatcbta
منابع مشابه
Large Scale Collocation Data and Their Application to Japanese Word Processor Technology
Word processors or computers used in Japan employ Japanese input method through keyboard stroke combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor of Kana-to-Kanji conversion technology is how to raise the accuracy of the conversion through the homophone processing, since we have so many homophonic Kanjis. In this paper, we re...
متن کاملUsing Collocations and K-means Clustering to Improve the N-pos Model for Japanese IME
Kana-Kanji conversion is known as one of the representative applications of Natural Language Processing (NLP) for the Japanese language. The N-pos model, presenting the probability of a Kanji candidate sequence by the product of bi-gram Part-of-Speech (POS) probabilities and POS-to-word emission probabilities, has been successfully applied in a number of well-known Japanese Input Method Editor ...
متن کاملDiscriminative Method for Japanese Kana-Kanji Input Method
The most popular type of input method in Japan is kana-kanji conversion, conversion from a string of kana to a mixed kanjikana string. However there is no study using discriminative methods like structured SVMs for kana-kanji conversion. One of the reasons is that learning a discriminative model from a large data set is often intractable. However, due to progress of recent researches, large sca...
متن کاملCollocations as Word Co-ocurrence Restriction Data - An Application to Japanese Word Processor
Collocations, the combination of specific words are quite useful linguistic resources for NLP in general. The purpose of this paper is to show their usefulness, exemplifying an application to Kanji character decision processes for Japanese word processors. Unlike recent trials of automatic extraction, our collocations were collected manually through many years of intensive investigation of corp...
متن کاملKana-Kanji Conversion System with Input Support Based on Prediction
1 I n t r o d u c t i o n TOSHIBA developed the world's first Japanese word processor in 1978. Unlike languages based on an alphabet , Japanese uses /,housands of Ica nji characters of varying comp]exity. Hence, l,o arrange all of l~a'~:ii chm'acl;ers on keyboard is; difficult. On the other hand, kana dlaracters which are phonetic scripl,s of Japanese have 83 variations; these can be arranged o...
متن کامل